Implement OpenTelemetry integration - client/server and explicit token latency #296

Draft: ppalucki wants to merge 1 commit into main from ppalucki-telemetry-poc

Conversation


@ppalucki ppalucki commented Aug 12, 2024

Description

Integrate OpenTelemetry (metrics only) and expose the metrics using the Prometheus exporter for:

  1. Server (incoming connections): request/response size and duration, with an http_route label for each handler:
    • http_server_duration_milliseconds (histogram: bucket/total/sum)
    • http_server_request_size_bytes_total (counter)
    • http_server_response_size_bytes_total (counter)
  2. Client (all outgoing connections to services), with a net_peer_name label to distinguish the target:
    • http_client_duration_milliseconds (histogram: bucket/total/sum)
    • http_client_request_size_bytes_total (counter)
    • http_client_response_size_bytes_total (counter)
  3. Explicit Float64 histograms for first token, next token, all token, and pipeline latencies:
    • llm_first_token_latency_milliseconds (histogram: bucket/total/sum)
    • llm_next_token_latency_milliseconds (histogram: bucket/total/sum)
    • llm_all_token_latency_milliseconds (histogram: bucket/total/sum)
    • llm_pipeline_latency_milliseconds (histogram: bucket/total/sum)
  4. Other default Go runtime metrics, e.g. go_gc_duration_seconds, go_goroutines and others.

As an additional feature, OpenTelemetry trace context propagation is integrated (not tested).
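
For orientation, a minimal sketch of how this wiring typically looks with the OTel Go SDK, Prometheus exporter and otelhttp (the handler names are taken from this PR, but the code here is illustrative, not copied from the change):

// Create a Prometheus exporter and register it as the global meter provider.
exporter, err := prometheus.New()
if err != nil {
	log.Error(err, "telemetry: cannot create Prometheus exporter")
}
provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
otel.SetMeterProvider(provider)

// Wrap the inbound handler with otelhttp so http_server_* metrics get recorded,
// and expose everything on /metrics via the Prometheus client handler.
mux := http.NewServeMux()
mux.Handle("/", otelhttp.NewHandler(http.HandlerFunc(mcGraphHandler), "mcGraphHandler"))
mux.Handle("/metrics", promhttp.Handler())

// Outbound requests use an otelhttp transport, which records the http_client_*
// metrics with a net_peer_name label per target service.
client := &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
_ = client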

Metrics are exposed on /metrics with the Prometheus handler. Example output:

1) server metrics example

# HELP http_server_duration_milliseconds Measures the duration of inbound HTTP requests.
# TYPE http_server_duration_milliseconds histogram
http_server_duration_milliseconds_bucket{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0",le="0"} 0
http_server_duration_milliseconds_bucket{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0",le="5"} 0
http_server_duration_milliseconds_bucket{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0",le="+Inf"} 2
http_server_duration_milliseconds_sum{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 25597.877786
http_server_duration_milliseconds_count{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 2
# HELP http_server_request_size_bytes_total Measures the size of HTTP request messages.
# TYPE http_server_request_size_bytes_total counter
http_server_request_size_bytes_total{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 202
# HELP http_server_response_size_bytes_total Measures the size of HTTP response messages.
# TYPE http_server_response_size_bytes_total counter
http_server_response_size_bytes_total{http_method="POST",http_route="/",http_scheme="http",http_status_code="200",net_host_name="127.0.0.1",net_host_port="8080",net_protocol_name="http",net_protocol_version="1.1",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 134

2) client metrics example

# HELP http_client_duration_milliseconds Measures the duration of outbound HTTP requests.
# TYPE http_client_duration_milliseconds histogram
http_client_duration_milliseconds_bucket{http_method="POST",http_status_code="200",net_peer_name="embedding-svc.chatqa.svc.cluster.local",net_peer_port="6000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0",le="5000"} 2
...
http_client_duration_milliseconds_bucket{http_method="POST",http_status_code="200",net_peer_name="llm-svc.chatqa.svc.cluster.local",net_peer_port="9000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0",le="0"} 0
...
# HELP http_client_request_size_bytes_total Measures the size of HTTP request messages.
# TYPE http_client_request_size_bytes_total counter
http_client_request_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="embedding-svc.chatqa.svc.cluster.local",net_peer_port="6000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 202
http_client_request_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="llm-svc.chatqa.svc.cluster.local",net_peer_port="9000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 3670
http_client_request_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="reranking-svc.chatqa.svc.cluster.local",net_peer_port="8000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 14076
http_client_request_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="retriever-svc.chatqa.svc.cluster.local",net_peer_port="7000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 19498
# HELP http_client_response_size_bytes_total Measures the size of HTTP response messages.
# TYPE http_client_response_size_bytes_total counter
http_client_response_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="embedding-svc.chatqa.svc.cluster.local",net_peer_port="6000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 19422
http_client_response_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="llm-svc.chatqa.svc.cluster.local",net_peer_port="9000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 480
http_client_response_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="reranking-svc.chatqa.svc.cluster.local",net_peer_port="8000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 3640
http_client_response_size_bytes_total{http_method="POST",http_status_code="200",net_peer_name="retriever-svc.chatqa.svc.cluster.local",net_peer_port="7000",otel_scope_name="go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp",otel_scope_version="0.49.0"} 13972

3) custom metrics example

# HELP llm_first_token_latency_milliseconds Measures the first token generate duration of.
# TYPE llm_first_token_latency_milliseconds histogram
llm_first_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="1"} 0
llm_first_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="64"} 0
...
llm_first_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="+Inf"} 2
llm_first_token_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 9550.203300000001
llm_first_token_latency_milliseconds_count{otel_scope_name="entrag-telemetry",otel_scope_version=""} 2
# HELP llm_next_token_latency_milliseconds Measures the next token generate duration of.
# TYPE llm_next_token_latency_milliseconds histogram
llm_next_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="1"} 4
llm_next_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="64"} 4
...
llm_next_token_latency_milliseconds_bucket{otel_scope_name="entrag-telemetry",otel_scope_version="",le="+Inf"} 36
llm_next_token_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 14365.578998999998
llm_next_token_latency_milliseconds_count{otel_scope_name="entrag-telemetry",otel_scope_version=""} 36

4) Prometheus client default Go language metrics:

skl212 ➜  ~ curl -sL -N http://127.0.0.1:8080/metrics | grep go_ | grep -v HELP | grep -v TYPE
go_gc_duration_seconds{quantile="0"} 9.7515e-05
go_gc_duration_seconds{quantile="0.25"} 0.000102628
go_gc_duration_seconds{quantile="0.5"} 0.000134259
go_gc_duration_seconds{quantile="0.75"} 0.000141445
go_gc_duration_seconds{quantile="1"} 0.000191476
go_gc_duration_seconds_sum 0.00130944
go_gc_duration_seconds_count 10
go_goroutines 8
go_info{version="go1.22.4"} 1
go_memstats_alloc_bytes 2.795128e+06
....
go_memstats_sys_bytes 2.276456e+07
go_threads 23

Issues

n/a

Type of change

  • New feature (non-breaking change which adds new functionality)

Dependencies

  • OpenTelemetry Go SDK (+ Prometheus exporter)
    • go.opentelemetry.io/otel
    • go.opentelemetry.io/otel/exporters/prometheus
    • go.opentelemetry.io/otel/metric
    • go.opentelemetry.io/otel/sdk/metric
  • Prometheus client:
    • github.com/prometheus/client_golang/prometheus/promhttp
  • OpenTelemetry Go contrib HTTP client/server instrumentation:
    • go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
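
For reference, the Go import block corresponding to the dependency list above might look roughly like this (grouping and aliases are illustrative):

import (
	"github.com/prometheus/client_golang/prometheus/promhttp"

	"go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/prometheus"
	"go.opentelemetry.io/otel/metric"
	sdkmetric "go.opentelemetry.io/otel/sdk/metric"
)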

Tests

Manually, using curl (examples shown above), after running only with the ChatQnA sample graph.

TODO:

  • better or new labels and attributes (e.g. otel_scope_name...)
  • better bucket boundaries for token latencies
  • reconfigure default bucket boundaries for server/client latency metrics
  • test OTEL trace propagation
  • write unit tests
  • document
  • remove some comments with examples

@ppalucki ppalucki force-pushed the ppalucki-telemetry-poc branch from 9bf7090 to d15ac8c on August 12, 2024 23:42
// bag := baggage.FromContext(ctx)
// span.AddEvent("handling this...", trace.WithAttributes(uk.String(bag.Member("username").Value())))

_, _ = io.WriteString(w, "Hello, world!\n")
Collaborator

Hi @ppalucki, what's the purpose of this response?

Author

@zhlsunshine I accidentally copied it from a working branch for streaming; after a later rebase it is now removed.

// span.AddEvent("handling this...", trace.WithAttributes(uk.String(bag.Member("username").Value())))

_, _ = io.WriteString(w, "Hello, world!\n")
ctx, cancel := context.WithTimeout(req.Context(), 3600*time.Minute)
Collaborator

Hi @ppalucki, might a 1-hour timeout be too long?

Author

@zhlsunshine fixed (the long timeout was only used for my debugging sessions)

@zhlsunshine
Collaborator

Hi @ppalucki, thanks for your effort on this PR!
In mcGraphHandler:

responseBody, statusCode, err := routeStep(defaultNodeName, *mcGraph, inputBytes, inputBytes, req.Header)

I think there should be a measurement after routeStep returns the responseBody, because it covers the duration of going through all the microservices in the pipeline. That information can be used to calculate and evaluate the handling time of the GMC router itself, which would be really helpful.

I think you could also add these measurement metrics in mcDataHandler, which focuses on data embedding and storage into the VectorDB.
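
For illustration, a rough sketch of where such a measurement could go in mcGraphHandler (routeStep and pipelineLatencyMeasure are names from this PR; the exact placement here is an assumption, not the code from this change):

pipelineStart := time.Now()
responseBody, statusCode, err := routeStep(defaultNodeName, *mcGraph, inputBytes, inputBytes, req.Header)
// Record how long routing through all pipeline microservices took, in milliseconds.
pipelineLatencyMeasure.Record(req.Context(),
	float64(time.Since(pipelineStart))/float64(time.Millisecond))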

//mux.Handle("/dataprep", otelhttp.NewHandler(http.HandlerFunc(mcDataHandler), "mcDataHandler"))

// Export metric using Prometheus client built-in handler.
mux.Handle("/metrics", promhttp.Handler())
Collaborator

Hi @ppalucki, is it possible to register these router metrics via the handleFunc function as well?

Author

@ppalucki ppalucki Aug 14, 2024

@zhlsunshine fixed (added).
Yes, it is possible; initially I didn't want to include /metrics in the stats.

But you're right - it should be covered and then excluded later in visualization/post-processing.
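
For reference, a small sketch of how the /metrics route itself can be covered with the same otelhttp wrapper used for the other routes (illustrative):

// Wrap the Prometheus handler as well, so scrapes of /metrics show up in the
// http_server_* metrics alongside the other routes (filter them out later in dashboards).
mux.Handle("/metrics", otelhttp.NewHandler(promhttp.Handler(), "metrics"))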

@ppalucki
Author

ppalucki commented Aug 14, 2024

Hi @ppalucki, thanks for your effort on this PR! In mcGraphHandler:

responseBody, statusCode, err := routeStep(defaultNodeName, *mcGraph, inputBytes, inputBytes, req.Header)

I think there should be a measurement after routeStep returns the responseBody, because it covers the duration of going through all the microservices in the pipeline. That information can be used to calculate and evaluate the handling time of the GMC router itself, which would be really helpful.

I think you could also add these measurement metrics in mcDataHandler, which focuses on data embedding and storage into the VectorDB.

I quickly added:

  • all_token_latency - essentially the sum of the first and all next token latencies,
  • pipeline_latency - what you asked for, but I'm still not sure it covers what you wanted, e.g. the "user facing" latency of all pipeline steps plus network.

I need to test it more carefully.

here is example output:

llm_all_token_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 103565.76762
llm_first_token_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 38418.945988
llm_next_token_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 59319.08985300002
llm_pipeline_latency_milliseconds_sum{otel_scope_name="entrag-telemetry",otel_scope_version=""} 6454.966733
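
A rough sketch of where these measurements could sit in the streaming path (the scanner loop and variable names are illustrative, not the exact code from this PR):

tokenStart := time.Now()
allStart := tokenStart
firstTokenSeen := false

for scanner.Scan() { // one streamed chunk/token per iteration (illustrative)
	now := time.Now()
	elapsedMs := float64(now.Sub(tokenStart)) / float64(time.Millisecond)
	if !firstTokenSeen {
		firstTokenLatencyMeasure.Record(ctx, elapsedMs)
		firstTokenSeen = true
	} else {
		nextTokenLatencyMeasure.Record(ctx, elapsedMs)
	}
	tokenStart = now
}

// All-token latency: total time from starting to read until the stream ends,
// i.e. roughly first token latency plus the sum of next token latencies.
allTokenLatencyMeasure.Record(ctx, float64(time.Since(allStart))/float64(time.Millisecond))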

@ppalucki ppalucki force-pushed the ppalucki-telemetry-poc branch 3 times, most recently from 735d3be to d1f3b04 on September 3, 2024 12:31
Small issues and also collect stats from /metrics router
remove examples + add allTokenLatency metric
Fix description of histograms
pipeline latency histogram before token measurments
@ppalucki ppalucki force-pushed the ppalucki-telemetry-poc branch from d1f3b04 to 5548dfd on September 4, 2024 06:38
@zhlsunshine
Collaborator

Hi @ppalucki, I think you may need to fix the DCO error by adding a Signed-off-by line with your GitHub ID and email.

provider := sdkmetric.NewMeterProvider(sdkmetric.WithReader(exporter))
otel.SetMeterProvider(provider)

// ppalucki: Own metrics defintion bellow
Contributor

Typos:

Suggested change
// ppalucki: Own metrics defintion bellow
// ppalucki: Own metrics definition below

Collaborator

nice catch!

Comment on lines +95 to +100
var (
	firstTokenLatencyMeasure metric.Float64Histogram
	nextTokenLatencyMeasure  metric.Float64Histogram
	allTokenLatencyMeasure   metric.Float64Histogram
	pipelineLatencyMeasure   metric.Float64Histogram
)
Contributor

@eero-t eero-t Sep 30, 2024

IMHO this would look nicer as:

Suggested change
var (
	firstTokenLatencyMeasure metric.Float64Histogram
	nextTokenLatencyMeasure  metric.Float64Histogram
	allTokenLatencyMeasure   metric.Float64Histogram
	pipelineLatencyMeasure   metric.Float64Histogram
)
type latencyMeasureT struct {
	firstToken, nextToken, allTokens, pipeline metric.Float64Histogram
}

var latencyMeasure latencyMeasureT

Comment on lines +118 to +155
firstTokenLatencyMeasure, err = meter.Float64Histogram(
	"llm.first.token.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration of first token generation."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register first token histogram measure")
}

nextTokenLatencyMeasure, err = meter.Float64Histogram(
	"llm.next.token.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration of generating all but first tokens."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register next token histogram measure")
}

allTokenLatencyMeasure, err = meter.Float64Histogram(
	"llm.all.token.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration to generate response with all tokens."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register all token histogram measure")
}

pipelineLatencyMeasure, err = meter.Float64Histogram(
	"llm.pipeline.latency",
	metric.WithUnit("ms"),
	metric.WithDescription("Measures the duration to going through pipeline steps until first token is being generated (including read data time from client)."),
	api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
)
if err != nil {
	log.Error(err, "metrics: cannot register pipeline histogram measure")
}
Contributor

There's a bit of duplication. Maybe this could iterate over a slice, like this (untested code):

metrics := []struct {
	metric *metric.Float64Histogram
	name   string
	desc   string
}{
	{&latencyMeasure.firstToken, "llm.first.token.latency", "..."},
	...
}

for _, item := range metrics {
	*item.metric, err = meter.Float64Histogram(
		item.name,
		metric.WithUnit("ms"),
		metric.WithDescription(item.desc),
		api.WithExplicitBucketBoundaries(1, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16364),
	)
	if err != nil {
		log.Error(err, "metrics: cannot register histogram measure", "name", item.name)
	}
}

Collaborator

nice catch!

@eero-t
Contributor

eero-t commented Nov 6, 2024

See also: opea-project/GenAIComps#845
